home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The Best of MacTutor - S…e Code for Volumes 1 to 5
/
The Best of MacTutor - Source Code for Volume 1-5 (Wayzata Technology)(6031)(1990).bin
/
Source Code
/
#05 (Dec85-Jan86)
/
pascal
/
Mondrian & dissolve source 1-13
/
dissbits.asm
next >
Wrap
Assembly Source File
|
1990-01-01
|
43KB
|
981 lines
;
; procedure dissBits (srcB, destB: bitMap; srcR, dstR: rect); external;
;
; mike morton
; release: 15 september 1985
;
; this is the version 5.2.
;
; differences from version 5 to 5.2 are:
; bug fixed in the full-screen-width case (first bit done right)
; magic-constant table is compacted, stored in bytes instead of longs
; some small changes to save space, plus the table saves about 100 bytes
; tested by a program on several million cases; this helped find...
; yet more bugs in log2 routine fixed ("bitwidth" routine added)
; differences from version 4 are:
; nasty bug fixed in copying small rectangles
; new address for correspondence
; slightly more detailed notes on calling from C
; various miscellaneous comment changes
; differences from version 3 are:
; bugs in the log2 routine fixed
; general case sped up about five percent
; certain cases (e.g., the full screen) sped up over fifty percent
; the time to dissolve is not directly related to the size of the rect
; differences between version 2 and version 3 are:
; documentation improved and neatened
; log2 routine rewritten
;
; comments and suggestions are, of course, welcome.
;
; ******************************************************************************
; * *
; * copyright 1984, 1985 by michael s. morton *
; * permission to publish and distribute granted to MacTutor *
; * please see details below on using, copying and changing this source. *
; * *
; ******************************************************************************
;
; what this routine does:
; ----------------------
;
; dissBits is like copyBits: it moves one rect to another, in their respective
; bitMaps. it doesn't implement the modes of copyBits, nor clipping to a
; region. what it DOES do is copy the bits in a pseudo-random order, giving
; the appearance of "dissolving" from one image to another. the dissolve is
; rapid: the entire screen will dissolve in under four seconds. (note: smaller
; areas may be SLOWER to dissolve -- see below.)
;
; copyBits pays attention to the current clipping. this routine doesn't.
;
; other likely differences from copyBits:
; o the rectangles must have the same extents (not necessarily the same
; lrbt). if they are not, the routine will return -- doing nothing!
; no stretching copy is done as copyBits would.
; o the cursor is hidden during the dissolve, since drawing is done without
; quickdraw calls. the cursor reappears when the drawing is finished.
; for an odd effect, change it not to hide the cursor; is this how bill
; atkinson thought of the spray can in MacPaint?
; o copyBits may be smart enough to deal with overlapping areas of memory.
; this routine certainly isn't.
; o because this routine is desperate for speed, it steals the A5 (globals)
; register, uses it in the central loop, then restores it before
; returning. if you have vertical retrace tasks which run during the
; dissolve, they'll wake up with the bogus A5. since the ROM pulls this
; trick too, i feel this isn't a bug in MY code, but it IS more likely
; to expose bugs in VBL tasks. to correct the problem, have your task
; load A5 from the low RAM location "currentA5" immediately upon starting
; up, then restore its caller's A5 before returning.
;
; you should know a few implementation details which may help:
; o copying from a dark area (lots of 1 bits) is slower than from a light
; area. but just barely (a few per cent).
; o there is no way to use this to randomly invert a rectangle. instead,
; copyBits it elsewhere, invert it, and dissBits it back into place.
; o there is also no way to slow the dissolve of a small area. to do this,
; copy a large area in which the only difference is the area to change.
; o if you fade in a solid area, you're likely to see patterns, since the
; random numbers are so cheesy. don't do this; fade in nifty patterns
; which will distract your viewers.
; o very small areas (less than 2 pixels in either dimension) are actually
; done with a call to the real copyBits routine, since the pseudo-random
; sequence generator falls apart under those conditions.
;
; a close relative of this routine is "dissBytes", which (as you might guess)
; copies a byte at a time, which is really fast (the whole screen in .1 or .2
; seconds). it works only for certain rectangles, though.
;
; sample calling code:
; -------------------
;
; this is an excerpt from how a prerelease of DarTerminal called this routine.
; note the clever use of "paintbehind". this took about 3 seconds to dissolve
; onto the screen.
;
;var rg: rgnhandle; (* window to copy into *)
; aport: grafptr; (* port to draw into *)
; bits: bitmap; (* new bitmap for that port *)
; r: rect; (* rectangle to draw into *)
; pat: pattern;
; text: packed array[1..37] of char;
; ...
; aport := grafptr(newptr(sizeof(grafport))); (* get a port *)
; openport(aport); (* make it current *)
;
; r := theport^.portbits.bounds; (* start with whole screen *)
; insetrect(r,100,100); (* get window-size rect *)
; (* note that the number of bytes per row must be even! *)
; bits.rowbytes := (((r.right-r.left)+15) div 16) * 2; (* find bytes/row *)
; bits.baseaddr := qdptr(newptr(bits.rowbytes*(r.bottom-r.top))); (* get space *)
; bits.bounds := r; (* boundary rect for bitmap *)
;
; setportbits(bits); (* make new bitmap current *)
;
; eraserect(r);
; textfont(london); textsize(18); textface([bold]);
; text := 'DarTerminal version -1.9 August 1984';
; textbox(@text,37,r,tejustcenter);
;
; dissbits(bits,screenport^.portbits,r,r); (* dissolve it in *)
;
; repeat until getnextevent(mdownmask+keydownmask,anevent); (* let 'em gawk *)
;
; rg := newrgn; (* get a region to clip with *)
; rectrgn(rg,r); (* as a rectangle *)
; paintbehind(windowpeek(frontwindow),rg); (* update this area of screen *)
;
; disposergn(rg);
; disposptr(ptr(bits.baseaddr));
; disposptr(ptr(aport));
;
;
; calling from languages other than pascal:
; ----------------------------------------
;
; this routine uses the standard Lisa Pascal calling sequence. to convert it to
; most C compilers, you'll probably just have to delete this instruction from
; near the end of the main routine:
; add.l #psize,SP ; unstack parameters
;
; i'd be very interested in hearing about successful uses of this routine from
; other languages.
;
; speed of the dissolve:
; ---------------------
;
; you need to pay attention to this section only if: (a) you want the dissolve
; to run as fast as it can OR (b) you do dissolves of various sizes and want
; them to take proportionate lengths of time.
;
; there are 3 levels of speed; the fastest possible one is chosen for you:
; (1) an ordinary dissolve will work when moving from any bitmap to any bitmap,
; including on the Lisa under MacWorks. this will dissolve at about 49
; microseconds per pixel. a rectangle one-quarter the size of the screen
; will dissolve in just over two seconds. the speed per pixel will vary
; slightly, and will be less if your rect extents are close to but less
; than powers of 2.
; (2) the dissolve will speed up if both the source and destination bitmaps have
; rowBytes fields which are powers of two. if you're copying to the screen
; on a mac, the rowBytes field already satisfies this. so, make your source
; bitmap the right width for a cheap speedup -- about 20% faster.
; (3) the fanciest level is intended for copying the whole screen. it'll paint
; it in about 3.4 seconds (19 microseconds per pixel). actually, painting
; any rectangle which is the full width of the screen will run at this
; speed, for what that's worth.
;
; duplication and use of this routine:
; -----------------------------------
;
; this is freeware. you're welcome to copy it and use it in programs. you're
; welcome to modify it, as long as you leave everything up until this section
; unchanged. i'd be very interested in seeing your changes, especially if you
; find ways to make the central loops faster. you're also welcome to port it
; to other machines/languages; i'd appreciate hearing about efforts to do this.
;
; this is "freeware"; please pay me if you use it. why?
;
; o if you have problems using it, i'll help you debug it.
; o i'll send you improved, debugged, faster versions.
; o i'll tell you about other products. this is the first thing i ever
; wrote for the Mac; wouldn't you like to see what else i've produced?
; send me some positive feedback!
;
; how much should you pay? my suggestion is:
; (cost of one copy of the program) * (log10 of number of copies sold)
; if the subroutine is an integral part of your program, double this amount.
; if it's a frill (e.g., you dissolve in your "About MacWhatever"), halve it.
;
; i find it hard to believe that any damages to you or anyone else could come
; from bugs in this routine. but, alas, whether or not you pay me, i can't be
; liable in any way for any problems in it.
;
; send comments, contributions, criticisms, or whatever to:
; mike morton
; INFOCOM
; 125 CambridgePark Dr.
; Cambridge, MA 02140
;
; if, for some reason, you only have a hard copy of this and would like a
; source on a diskette, please contact:
; MacTutor
; Source Code Disks
; P.O. Box 846
; Placentia, CA. 92670
; (714) 579-7700
;
;
; things to think about:
; ---------------------
;
; o clean up the register usage (as if i'll ever actually get around to this)
; o use a dynamically-built table to avoid the multiply instruction in the general case.
; this may not be a great idea since not everyone can afford that much space.
; o adapt the routine to do transfer modes. especially pattern modes, with no source.
; it'd run significantly faster doing a fill of just black or white! and patXor
; would be pretty cool too (see also the XOR idea below).
; o consider a front-end which does clipping (just like copyBits) by:
; - allocate yet another offscreen bitmap, whose bounds are the intersection of
; the destination rect with the bounding box of the destination's clipping rgn.
; - copy from the destination's bitmap into the temporary bitmap.
; - do a copyBits from the source to the temp, with the requested masking region.
; - dissolve from the temp to the destination, thus actually dissolving that whole
; rect, but changing only the clipped stuff.
; o a nifty way to speed up things would be to XOR the destination into the source [sic!].
; then the main loop doesn't have to copy a bit; it just tests if the bit is ON in the
; altered source bitmap and, if so, TOGGLES it in the destination. (i.e., it does a
; srcXor operation). to repair the source bitmap, a third XOR from the destination to
; the source should do it [any old-timer will recognize this triple XOR as the best
; way to swap two registers.] the trick is making the invisible XOR run so fast that
; the time to do it is less than the savings in the visible part. or perhaps the batch
; XORs could be done in spare-time...
; o implement a partial dissolve. this would let alex animate star trek-style teleporting.
; there are a lot of ways to do this. the easiest might be to include a counter in the
; loop ("a register! my kingdom for a register!"). alternately, we could assume that
; the list of starting and ending points could be limited, allowing a table-based system.
; this is something worth looking into, but could be real difficult.
; o look at rearranging instructions so CPU-intensive ones aren't grouped, allowing us to
; sneak in between video cycles more often.
; o add some real error-handling. how should we do this? return a status? fault? we
; could also return info to our caller on which of the three loops got used, so they know
; they're getting the speed they want.
; o don't use the BITWIDTH routine to test for exact powers of two in MULCHK. a number X]
; is an exact power of two if (x & -x) equals x.
;
;
;
; -- end of introduction; real stuff starts here --
;
;
;
; DISSBITS SUBROUTINE
; MDS ASSEMBLER VERSION
; Converted by David E. Smith for MacTutor.
;
;
xdef dissBits
Include QuickEqu.D ; MDS toolbox equates and traps
Include SysEqu.D
Include ToolEqu.D
Include MacTraps.D
MACRO .equ = equ| ; convert Lisa assembler stuff to MDS Mac
MACRO _hidecurs = _HideCursor|
MACRO _showcurs = _ShowCursor|
;
; definitions of the "ours" record: this structure, of which there are two copies in
; our stack frame, is a sort of bitmap:
;
oRows .equ 0 ; (word) number of last row (first is 0)
oCols .equ oRows+2 ; (word) number of last column (first is 0)
oLbits .equ oCols+2 ; (word) size of left margin within 1st byte
oStride .equ oLbits+2 ; (word) stride in memory from row to row
oBase .equ oStride+2 ; (long) base address of bitmap
osize .equ oBase+4 ; size, in bytes, of "ours" record
;
; stack frame elements:
;
srcOurs .equ -osize ; (osize) our view of source bits
dstOurs .equ srcOurs-osize ; (osize) our view of target bits
sflast .equ dstOurs ; relative address of last s.f. member
sfsize .equ -sflast ; size of s.f. for LINK (must be EVEN!)
;
; parameter offsets from the stack frame pointer, A6:
; last parameter is above return address and old s.f.
;
dRptr .equ 4+4 ; ^destination rectangle
sRptr .equ dRptr+4 ; ^source rectangle
dBptr .equ sRptr+4 ; ^destination bitMap
sBptr .equ dBptr+4 ; ^source bitMap
plast .equ sBptr+4 ; address just past last parameter
psize .equ plast-dRptr ; size of parameters, in bytes
;
; entrance: set up a stack frame, save some registers, hide the cursor.
;
dissBits: ; main entry point
link A6,#-sfsize ; set up a stack frame
movem.l D3-D7/A2-A5,-(SP) ; save registers compiler may need
_hidecurs ; don't let the cursor show for now
;
; convert the source and destination bitmaps and rectangles to a format we prefer.
; we won't look at these parameters after this.
;
move.l sBptr(A6),A0 ; point to source bitMap
move.l sRptr(A6),A1 ; and source rectangle
lea srcOurs(A6),A2 ; and our source structure
bsr CONVERT ; convert to our format
move.l dBptr(A6),A0 ; point to destination bitMap
move.l dRptr(A6),A1 ; and rectangle
lea dstOurs(A6),A2 ; and our structure
bsr CONVERT ; convert to our format
;
; check that the rectangles match in size.
;
move.w srcOurs+oRows(A6),D0 ; pick up the number of rows
cmp.w dstOurs+oRows(A6),D0 ; same number of rows?
bne ERROR ; nope -- bag it
move.w srcOurs+oCols(A6),D0 ; check the number of columns
cmp.w dstOurs+oCols(A6),D0 ; same number of columns, too?
bne ERROR ; that's a bozo no-no
;
; figure the bit-width needed to span the columns, and the rows.
;
move.w srcOurs+oCols(A6),D0 ; get count of columns
ext.l D0 ; make it a longword
bsr LOG2 ; figure bit-width
move.w D0,D1 ; set aside that result
beq SMALL ; too small? wimp out and do it with copyBits
move.w srcOurs+oRows(A6),D0 ; get count of rows
ext.l D0 ; make it a longword
bsr LOG2 ; again, find the bit-width
tst.w D0 ; is the result zero?
beq SMALL ; if so, our algorithm will screw up
;
; set up various constants we'll need in the in the innermost loop
;
move.l #1,D5 ; set up...
lsl.l D1,D5 ; ...the bit mask which is...
sub.l #1,D5 ; ...bit-width (cols) 1's
add.w D1,D0 ; find total bit-width (rows plus columns)
lea TABLE,A0 ; point to the table of XOR masks
moveq #0,D3 ; clear out D3 before we fill the low byte
move.b 0(A0,D0),D3 ; grab the correct XOR mask in D3
;
; the table is saved compactly, since none of the masks are wider than a byte. we have to unpack it so
; the high-order bit of the D0-bit-wide field is on:
;
UNPACK: add.l D3,D3 ; shift left by one
bpl.s UNPACK ; keep moving until the top bit that's on is aligned at the top end
rol.l D0,D3 ; now swing the top D0 bits around to be the bottom D0 bits, the mask
move.l D3,D0 ; 1st sequence element is the mask itself
;
; do all kinds of preparation:
;
move.l srcOurs+oBase(A6),D2 ; set up base pointer for our source bits
lsl.l #3,D2 ; make it into a bit address
move.l D2,A0 ; put it where the fast loop will use it
move.w srcOurs+oLbits(A6),D2 ; now pick up source left margin
ext.l D2 ; make it a longword
add.l D2,A0 ; and make A0 useful for odd routine below
move.l dstOurs+oBase(A6),D2 ; set up base pointer for target
lsl.l #3,D2 ; again, bit addressing works out faster
move.l D2,A1 ; stuff it where we want it for the loop
move.w dstOurs+oLbits(A6),D2 ; now pick up destination left margin
ext.l D2 ; make it a longword
add.l D2,A1 ; and make A1 useful, too
move.w srcOurs+oCols(A6),A2 ; pick up the often-used count of columns
move.w srcOurs+oRows(A6),D2 ; and of rows
add.w #1,D2 ; make row count one-too-high for compares
ext.l D2 ; and make it a longword
lsl.l D1,D2 ; slide it to line up w/rows part of D0
move.l D2,A4 ; and save that somewhere useful
move.w D1,D2 ; put log2(columns) in a safe place (sigh)
;
; try to reduce the amount we shift down D2. this involves:
; halving the strides as long as each is even, decrementing D2 as we go
; masking the bottom bits off D4 when we extract the row count in the loop
;
; alas, we can't always shift as little as we want. for instance, if we don't
; shift down far enough, the row count will be so high as to exceed a halfword,
; and the dread mulu instruction won't work (it eats only word operands). so,
; we have to have an extra check to take us out of the loop early.
;
move.w srcOurs+oStride(A6),D4 ; pick up source stride
move.w dstOurs+oStride(A6),D7 ; and target stride
move.w srcOurs+oRows(A6),D1 ; pick up row count for kludgey check
tst.w D2 ; how's the bitcount?
beq.s HALFDONE ; skip out if already down to zero
HALFLOOP:
btst #0,D4 ; is this stride even?
bne.s HALFDONE ; nope -- our work here is done
btst #0,D7 ; how about this one?
bne.s HALFDONE ; have to have both even
lsl.w #1,D1 ; can we keep max row number in a halfword?
bcs.s HALFDONE ; nope -- D2 mustn't get any smaller!
lsr.w #1,D4 ; halve each stride...
lsr.w #1,D7 ; ...like this
sub.w #1,D2 ; and remember not to shift down as far
bne.s HALFLOOP ; loop unless we're down to no shift at all
HALFDONE: ; no tacky platitudes, please
move.w D4,srcOurs+oStride(A6) ; put back source stride
move.w D7,dstOurs+oStride(A6) ; and target stride
;
; make some stuff faster to access -- use the fact that (An) is faster to access
; than d(An). this means we'll misuse our frame pointer, but don't worry -- we'll
; restore it before we use it again.
;
move.w srcOurs+oStride(A6),A5 ; make source stride faster to access, too
move.l A6,-(SP) ; save framitz pointer
move.w dstOurs+oStride(A6),A6 ; pick up destination stride
move.l #0,D6 ; we do only AND.W x,D6 -- but ADD.L D6,x
clr.w -(SP) ; reserve room for function result
bsr MULCHK ; go see if strides are powers of two
tst.w (SP)+ ; can we eliminate the horrible MULUs?
bne NOMUL ; yes! hurray!
;
; main loop: map the sequence element into rows and columns, check if it's in bounds
; and skip on if it's not, flip the appropriate bit, generate the next element in the
; sequence, and loop if the sequence isn't done.
;
;
; check the row bounds. note that we can check the row before extracting it from
; D0, ignoring the bits at the bottom of D0 for the columns. to get these bits
; to be ignored, we had to make A4 one-too-high before shifting it up to align it.
;
LOOP: ; here for another time around
cmp.l A4,D0 ; is row in bounds?
bge.s NEXT ; no: clip this
;
; map it into the column; check bounds. note that we save this check for second;
; it's a little slower because of the move and mask.
;
; chuck sagely points out that when the "bhi" at the end of the loop takes, we
; know we can ignore the above comparison. thanks, chuck. you're a great guy.
;
LOOPROW: ; here when we know the row number is OK
move.w D0,D6 ; copy the sequence element
and.w D5,D6 ; find just the column number
cmp.w A2,D6 ; too far to the right? (past oCols?)
bgt.s NEXT ; yes: skip out
move.l D0,D4 ; we know element will be used; copy it
sub.w D6,D4 ; remove column's bits
lsr.l D2,D4 ; shift down to row, NOT right-justified
;
; get the source byte, and bit offset. D4 has the bit offset in rows, and
; D6 is columns.
;
move.w A5,D1 ; get the stride per row (in bits)
mulu D4,D1 ; stride * row; find source row's offset in bits
add.l D6,D1 ; add in column offset (bits)
add.l A0,D1 ; plus base of bitmap (bits [sic])
move.b D1,D7 ; save the bottom three bits for the BTST
lsr.l #3,D1 ; while we shift down to a word address
move.l D1,A3 ; and save that for the test, too
not.b D7 ; get right bit number (compute #7-D7)
;
; find the destination bit address and bit offset
;
move.w A6,D1 ; extract cunningly hidden destination stride
mulu D1,D4 ; stride*row number = dest row's offset in bits
add.l D6,D4 ; add in column bit offset
add.l A1,D4 ; and base address, also in bits
move.b D4,D6 ; set aside the bit displacement
lsr.l #3,D4 ; make a byte displacement
not.b D6 ; get right bit number (compute #7-D6)
btst D7,(A3) ; test the D7th bit of source byte
move.l D4,A3 ; point to target byte (don't lose CC from btst)
bne.s SETON ; if on, go set destination on
bclr D6,(A3) ; else clear destination bit
;
; find the next sequence element. see knuth, vol ii., page 29 for sketchy details.
;
NEXT: ; jump here if D0 not in bounds
lsr.l #1,D0 ; slide one bit to the right
bhi.s LOOPROW ; if no carry out, but not zero, loop
eor.l D3,D0 ; flip magic bits appropriate to the bitwidth we want...
cmp.l D3,D0 ; ...but has this brought us to square 1?
bne.s LOOP ; if not, loop back; else...
bra DONE ; ...we're finished
SETON:
bset D6,(A3) ; source bit was on: set destination on
; copy of above code, stolen for inline speed -- sorry.
lsr.l #1,D0 ; slide one bit to the right
bhi.s LOOPROW ; if no carry out, but not zero, loop
eor.l D3,D0 ; flip magic bits...
cmp.l D3,D0 ; ...but has this brought us to square 1?
bne.s LOOP ; if not, loop back; else fall through
;
; here when we're done; the (0,0) point has not been done yet. this is
; really the (0,left margin) point. we also jump here from another copy loop.
;
DONE:
move.l (SP)+,A6 ; restore stack frame pointer
move.w srcOurs+oLbits(A6),D0 ; pick up bit offset of left margin
move.w dstOurs+oLbits(A6),D1 ; and ditto for target
not.b D0 ; flip to number the bits for 68000
not.b D1 ; ditto
; alternate, late entrance, when SCREEN routine has already set up D0 and D1 (it doesn't want the bit
; offset negated).
DONEA: ; land here with D0, D1 set
move.l srcOurs+oBase(A6),A0 ; set up base pointer for our source bits
move.l dstOurs+oBase(A6),A1 ; and pointer for target
bset D1,(A1) ; assume source bit was on; set target
btst D0,(A0) ; was first bit of source on?
bne.s DONE2 ; yes: skip out
bclr D1,(A1) ; no: oops! set it right, and fall through
;
; return
;
DONE2: ; here when we're really done
ERROR: ; we return silently on errors
_showcurs ; let's see this again
movem.l (SP)+,D3-D7/A2-A5 ; restore lots of registers
unlk A6 ; restore caller's stack frame pointer
move.l (SP)+,A0 ; pop return address
add.l #psize,SP ; unstack parameters
jmp (A0) ; home to mother
;
; -----------------------------------------------------------------------------------
;
; sleazo code for when we're asked to dissolve very small regions. if either dimension of the rectangle
; is too small, we bag it and just delegate the problem to copyBits. a possible problem with this is if
; someone decides to substitute us for the standard copyBits routine -- this case will become recursive...
;
SMALL: ; here when it's too small to copy ourselves
move.l sBptr(A6),-(SP) ; push args: source bitmap
move.l dBptr(A6),-(SP) ; destination bitmap
move.l sRptr(A6),-(SP) ; source rectangle
move.l dRptr(A6),-(SP) ; destination rectangle
move.w #srcCopy,-(SP) ; transfer mode -- source copy
clr.l -(SP) ; mask region -- NIL
_copyBits ; do the copy in quickdraw-land
bra DONE2 ; head for home
;
; -----------------------------------------------------------------------------------
;
; code identical to the usual loop, but A5 and A6 have been changed to shift counts.
; other than that, it's the same. really it is! well, no, wait a minute...
; because we don't have to worry about the word-size mulu operands, we can collapse
; the shifts and countershifts further as shown below:
NOMUL: ; here for alternate version of loop
tst.w D2 ; is right shift zero?
beq.s NOMUL2 ; yes: can't do much more...
cmp.w #0,A5 ; how about one left shift (for source stride)?
beq.s NOMUL2 ; yes: ditto
cmp.w #0,A6 ; and the other left shift (destination stride)?
beq.s NOMUL2 ; yes: can't do much more...
sub.w #1,D2 ; all three...
sub.w #1,A5 ; ...are...
sub.w #1,A6 ; ...collapsible
bra.s NOMUL ; go see if we can go further
;
; see if we can do the super-special-case loop, which basically is equivalent to any rectangle
; where the source and destination are both exactly the width of the Mac screen.
;
NOMUL2: ; here when D2, A5, and A6 are all collapsed
tst.w D2 ; did this shift get down to zero?
bne.s NLOOP ; no: skip to first kludged loop
cmp.w #0,A5 ; is this zero?
bne.s NLOOP ; no: again, can't make further optimization
cmp.w #0,A6 ; how about this?
bne.s NLOOP ; no: the best-laid plans of mice and men...
cmp.w A2,D5 ; is there no check on the column?
bne.s NLOOP ; not a power-of-two columns; rats!
move.w A0,D6 ; grab the base address of the source
and.b #7,D6 ; select the low three bits
bne.s NLOOP ; doesn't sit on a byte boundary; phooey
move.w A1,D6 ; now try the base of the destination
and.b #7,D6 ; and select its bit offset
beq.s SCREEN ; yes! do extra-special loop!
;
; fast, but not super-fast loop, used when both source and destination bitmaps have strides which are
; powers of two.
;
NLOOP: ; here for another time around
cmp.l A4,D0 ; is row in bounds?
bge.s NNEXT ; no: clip this
NLOOPROW: ; here when we know the row number is OK
move.w D0,D6 ; copy the sequence element
and.w D5,D6 ; find just the column number
cmp.w A2,D6 ; too far to the right? (past oCols?)
bgt.s NNEXT ; yes: skip out
move.l D0,D4 ; we know element will be used; copy it
sub.w D6,D4 ; remove column's bits
lsr.l D2,D4 ; shift down to row, NOT right-justified
move.w A5,D7 ; get log2 of stride per row (in bits)
move.l D4,D1 ; make a working copy of the row number
lsl.l D7,D1 ; * stride/row is source row's offset in bits
add.l D6,D1 ; add in column offset (bits)
add.l A0,D1 ; plus base of bitmap (bits [sic])
move.b D1,D7 ; save the bottom three bits for the BTST
lsr.l #3,D1 ; while we shift down to a byte address
move.l D1,A3 ; and save that for the test, too
not.b D7 ; get right bit number (compute #7-D7)
move.w A6,D1 ; extract log2 of destination stride
lsl.l D1,D4 ; stride*row number = dest row's offset in bits
add.l D6,D4 ; add in column bit offset
add.l A1,D4 ; and base address, also in bits
move.b D4,D6 ; set aside the bit displacement
lsr.l #3,D4 ; make a byte displacement
not.b D6 ; get right bit number (compute #7-D6)
btst D7,(A3) ; test the D7th bit of source byte
move.l D4,A3 ; point to target byte (don't ruin CC from btst)
bne.s NSETON ; if on, go set destination on
bclr D6,(A3) ; else clear destination bit
NNEXT: ; jump here if D0 not in bounds
lsr.l #1,D0 ; slide one bit to the right
bhi.s NLOOPROW ; if no carry out, but not zero, loop
eor.l D3,D0 ; flip magic bits...
cmp.l D3,D0 ; ...but has this brought us to square 1?
bne.s NLOOP ; if not, loop back; else...
bra DONE ; ...we're finished
NSETON:
bset D6,(A3) ; source bit was on: set destination on
lsr.l #1,D0 ; slide one bit to the right
bhi.s NLOOPROW ; if no carry out, but not zero, loop
eor.l D3,D0 ; flip magic bits...
cmp.l D3,D0 ; ...but has this brought us to square 1?
bne.s NLOOP ; if not, loop back; else fall through
bra DONE ; and finish
;
; -----------------------------------------------------------------------------------
;
; super-special case, which happens to hold for the whole mac screen -- or subsets
; of it which are as wide as the screen. here, we've found that the shift counts
; in D2, A5, and A6 can all be collapsed to zero. and D5 equals A2, so there's
; no need to check whether D6 is in limits -- or even take it out of D0! so, this loop
; is the NLOOP code without the shifts or the check on the column number. should
; run like a bat; have you ever seen a bat run?
;
; oh, yes, one further restriction -- the addresses in A0 and A1 must point to
; integral byte addresses with no bit offset. (this still holds for full-screen
; copies.) because both the source and destination are byte-aligned, we can skip
; the ritual Negation Of The Bit Offset which the 68000 usually demands.
SCREEN: ; here to set up to do the whole screen, or at least its width
move.l A0,D6 ; take the base source address...
lsr.l #3,D6 ; ... and make it a byte address
move.l D6,A0 ; replace pointer
move.l A1,D6 ; now do the same...
lsr.l #3,D6 ; ...for...
move.l D6,A1 ; ...the destination address
bra.s N2LOOP ; jump into loop
N2HEAD: ; here when we shifted and a bit carried out
eor.l D3,D0 ; flip magic bits to make the sequence work
N2LOOP: ; here for another time around
cmp.l A4,D0 ; is row in bounds?
bge.s N2NEXT ; no: clip this
N2LOOPROW: ; here when we know the row number is OK
move.l D0,D1 ; copy row number, shifted up, plus column offset
lsr.l #3,D1 ; while we shift down to a word offset
btst D0,0(A0,D1) ; test bit of source byte
bne.s N2SETON ; if on, go set destination on
bclr D0,0(A1,D1) ; else clear destination bit
N2NEXT: ; jump here if D0 not in bounds
lsr.l #1,D0 ; slide one bit to the right
bhi.s N2LOOPROW ; if no carry out, but not zero, loop
bne.s N2HEAD ; if carry out, but not zero, loop earlier
bra.s N2DONE ; 0 means next sequence element would have been D3
N2SETON:
bset D0,0(A1,D1) ; source bit was on: set destination on
lsr.l #1,D0 ; slide one bit to the right
bhi.s N2LOOPROW ; if no carry out, but not zero, loop
bne.s N2HEAD ; if carry out, but not zero, loop earlier
; zero means the loop has closed on itself
;
; because our bit-numbering isn't like that of the other two loops, we set up D0 and D1
; ourselves before joining a bit late with the common code to get the last bit.
;
N2DONE:
move.l (SP)+,A6 ; restore the stack frame pointer
move.w srcOurs+oLbits(A6),D0 ; pick up bit offset of left margin
move.w dstOurs+oLbits(A6),D1 ; and ditto for target
bra DONEA ; go do the first bit, which the sequence doesn't cover
;
; -----------------------------------------------------------------------------------
;
; mulchk -- see if we can do without multiply instructions.
;
; calling sequence:
; A5 holds the source stride
; A6 holds the destination stride
; clr.w -(SP) ; reserve room for boolean function return
; bsr MULCHK ; go check things out
; tst.w (SP)+ ; test result
; bne.s SHIFT ; if non-zero, we can shift and not multiply
;
; (if we can shift, A5 and A6 have been turned into shift counts)
;
; registers used: none (A5, A6)
MULCHK:
movem.l D0-D3,-(SP) ; stack caller's registers
move.l A5,D0 ; take the source stride
bsr BITWIDTH ; take log base 2
move.l #1,D1 ; pick up a one...
lsl.l D0,D1 ; ...and try to recreate the stride
cmp.l A5,D1 ; does it come out the same?
bne.s NOMULCHK ; nope -- bag it
move.w D0,D3 ; save magic logarithm of source stride
move.l A6,D0 ; yes -- now how about destination stride?
bsr BITWIDTH ; convert that one, also
move.l #1,D1 ; again, try a single bit...
lsl.l D0,D1 ; ...and see if original # was 1 bit
cmp.l A6,D1 ; how'd it come out?
bne.s NOMULCHK ; doesn't match -- bag this
;
; we can shift instead of multiplying. change address registers & tell our caller.
;
move.w D3,A5 ; set up shift for source stride
move.w D0,A6 ; and for destination stride
st 4+16(SP) ; tell our caller what's what
bra.s MULRET ; and return
NOMULCHK:
sf 4+16(SP) ; tell caller we can't optimize
MULRET: ; here to return; result set
movem.l (SP)+,D0-D3 ; pop some registers
rts ; all set
;
; -----------------------------------------------------------------------------------
;
; table of (longword) masks to XOR in strange Knuthian algorithm. the first table
; entry is for a bit-width of two, so the table actually starts two bytes before
; that. hardware jocks among you may recognize this scheme as the software analog
; of a "maximum-length sequence generator".
;
; to save a bit of room, masks are packed in bytes, but should be aligned as
; described in the code before being used.
;
.ALIGN 2
table: DC.B 0,0 ; first element is #2
DC.B 3 ; 2
DC.B 3 ; 3
DC.B 3 ; 4
DC.B 5 ; 5
DC.B 3 ; 6
DC.B 3 ; 7
DC.B 23 ; 8
DC.B 17 ; 9
DC.B 9 ; 10
DC.B 5 ; 11
DC.B 101 ; 12
DC.B 27 ; 13
DC.B 53 ; 14
DC.B 3 ; 15
DC.B 45 ; 16
DC.B 9 ; 17
DC.B 129 ; 18
DC.B 57 ; 19
DC.B 9 ; 20
DC.B 5 ; 21
DC.B 3 ; 22
DC.B 33 ; 23
DC.B 27 ; 24
DC.B 9 ; 25
DC.B 113 ; 26
DC.B 57 ; 27
DC.B 9 ; 28
DC.B 5 ; 29
DC.B 101 ; 30
DC.B 9 ; 31
DC.B 163 ; 32
.align 2
;
; -----------------------------------------------------------------------------------
;
; convert -- convert a parameter bitMap and rectangle to our internal form.
;
; calling sequence:
; lea bitMap,A0 ; point to the bitmap
; lea rect,A1 ; and the rectangle inside it
; lea ours,A2 ; and our data structure
; bsr CONVERT ; call us
;
; when done, all fields of the "ours" structure are filled in:
; oBase is the address of the first byte in which any bits are to be changed
; oLbits is the number of bits into that first byte which are ignored
; oStride is the stride from one row to the next, in bits
; oCols is the number of columns in the rectangle
; oRows is the number of rows
;
; registers used: D0, D1, D2
;
CONVERT:
;
; save the starting word and bit address of the stuff:
;
move.w top(A1),D0 ; pick up top of inner rectangle
sub.w bounds+top(A0),D0 ; figure rows to skip within bitmap
mulu rowbytes(A0),D0 ; compute bytes to skip (relative offset)
add.l baseaddr(A0),D0 ; find absolute address of first row to use
move.w left(A1),D1 ; pick up left coordinate of inner rect
sub.w bounds+left(A0),D1 ; find columns to skip
move.w D1,D2 ; copy that
and.w #7,D2 ; compute bits to skip in first byte
move.w D2,oLbits(A2) ; save that in the structure
lsr.w #3,D1 ; convert column count from bits to bytes
ext.l D1 ; convert to a long value, so we can...
add.l D1,D0 ; add to row start in bitmap to find 1st byte
move.l D0,oBase(A2) ; save that in the structure
;
; save the stride of the bitmap; this is the same as for the original, but in bits.
;
move.w rowbytes(A0),D0 ; pick up the stride
lsl.w #3,D0 ; multiply by eight to get a bit stride
move.w D0,oStride(A2) ; stick it in the target structure
;
; save the number of rows and columns.
;
move.w bottom(A1),D0 ; get the bottom of the rectangle
sub.w top(A1),D0 ; less the top coordinate
sub.w #1,D0 ; get number of highest row (1st is zero)
bmi.s CERROR ; nothing to do? (note: 0 IS ok)
move.w D0,oRows(A2); ; save that in the structure
move.w right(A1),D0 ; get the right edge of the rectangle
sub.w left(A1),D0 ; less the left coordinate
sub.w #1,D0 ; make it zero-based
bmi CERROR ; nothing to do here?
move.w D0,oCols(A2) ; save that in the structure
;
; all done. return.
;
rts
;
; error found in CONVERT. pop return and jump to the error routine, such as it is.
;
CERROR:
addq.l #4,SP ; pop four bytes of return address.
bra ERROR ; return silently
;
; -----------------------------------------------------------------------------------
;
; log2 -- find the ceiling of the log, base 2, of a number.
; bitwidth -- find how many bits wide a number is
;
; calling sequence:
; move.l N,D0 ; store the number in D0
; bsr LOG2 ; call us
; move.w D0,... ; D0 contains the word result
;
; registers used: D2, (D0)
;
BITWIDTH:
sub.l #1,D0 ; so 2**n works right (sigh)
LOG2:
tst.l D0 ; did they pass us a zero?
beq.s LOGDONE ; call log2(0) zero -- what the heck...
beq.s LOGDONE ; if D0 was one, answer is zero
move.w #32,D2 ; initialize count
LOG2LP:
lsl.l #1,D0 ; slide bits to the left by one
dbcs D2,LOG2LP ; decrement and loop until a bit falls off
move.w D2,D0 ; else save our value where we promised it
LOGDONE: ; here with final value in D0
rts ; and return
END ; procedure dissBits